Potential downsides of using explicit probabilities
In various communities (including the EA and rationalist communities), it’s common to make use of explicit, numerical probabilities.[1]
At an extreme end, this may be done in an attempt to calculate and then do whatever seems to maximise expected utility.
It could also involve attempts to create explicit, probabilistic models (EPMs), perhaps involving expected value calculations, and to use them as an input into decision-making. (So the EPM may not necessarily be the only input, or necessarily be intended to include everything that’s important.) Examples of this include the cost-effectiveness analyses created by GiveWell or ALLFED.
Most simply, a person may generate just a single explicit probability (EP; e.g., “I have a 20% chance of getting this job”), and then use that as an input into decision-making.
(For simplicity, in this post I’ll often say “using EPs” as a catchall term for using a single EP, using EPMs, or maximising expected utility. I’ll also often say “alternative approaches” to refer to more qualitative or intuitive methods, ranging from simply “trusting your gut” to extensive deliberations where you don’t explicitly quantify probabilities.)
Many arguments for the value of using EPs have been covered elsewhere (and won’t be covered here). I find many of these quite compelling, and believe that one of the major things the EA and rationalist communities get right is relying on EPs more than the general public does.
But use of EPs is also often criticised. And I (along with most EAs and rationalists, I suspect) don’t use EPs, at least for most everyday decisions, and I think that’s probably often a good thing.
So the first aim of this post is to explore some potential downsides of using EPs (compared to alternative approaches) that people have proposed. I’ll focus not on the case of ideal rational agents, but of actual humans, in practice, with our biases and limited computational abilities. Specifically, I discuss the following (non-exhaustive) list of potential downsides:
Time and effort costs
Excluding some of one’s knowledge (which could’ve been leveraged by alternative approaches)
Causing overconfidence
Underestimating the value of information
The optimizer’s curse
Anchoring (to the EP, or to the EPM’s output)
Causing reputational issues
As I’ll discuss, these downsides will not always apply when using EPs, and many will also sometimes apply when using alternative approaches. And when these downsides do apply to uses of EPs, they may often be outweighed by the benefits of using EPs. So this post is not meant to definitively determine the sorts of situations one should vs shouldn’t use EPs in. But I do think these downsides are often at least important factors to consider.
Sometimes people go further, linking discussion of these potential downsides of using EPs as humans, in practice, to claims that there’s an absolute, binary distinction between “risk” and “(Knightian) uncertainty”, or between situations in which we “have” vs “don’t have” probabilities, or something like that. Here’s one statement of this sort of view (from Dominic Roser, who disagrees with it):
According to [one] view, certainty has two opposites: risk and uncertainty. In the case of risk, we lack certainty but we have probabilities. In the case of uncertainty, we do not even have probabilities. [...] According to a popular view, then, how we ought to make policy decisions depends crucially on whether we have probabilities.
I’ve previously argued that there’s no absolute, binary risk-uncertainty distinction, and that believing that there is such a distinction can lead to using bad decision-making procedures. I’ve also argued that we can always assign probabilities (or at least use something like an uninformative prior). But I didn’t address the idea that, in practice, it might be valuable for humans to sometimes act as if there’s a binary risk-uncertainty distinction, or as if it’s impossible to assign probabilities.
Thus, the second aim of this post is to explore whether that’s a good idea. I argue that it is not, with a potential caveat related to reputational issues.
So each section will:
outline a potential downside of using EPs
discuss whether that downside really applies more to using EPs than to alternative approaches
explain why I believe this downside doesn’t suggest one should even act as if there’s a binary risk-uncertainty distinction
Epistemic status: This is basically meant as a collection and analysis of existing ideas, not as anything brand new. I’m not an expert on the topics covered.
Time and effort costs
The most obvious downside of using EPs (or at least EPMs) is that it may often take a lot of time and energy to use them well enough to get better results than one would get from alternative approaches (e.g., trusting your gut).
For example, GiveWell’s researchers collectively spend “hundreds of hours [...] per year on cost-effectiveness analysis”. I’d argue that that’s worthwhile when the stakes are as high as they are in GiveWell’s case (i.e., determining which charities receive tens of millions of dollars each year).
But what if I’m just deciding what headphones to buy? Is it worth it for me to spend a few hours constructing a detailed model of all the factors relevant to the question, and then finding (or estimating) values for each of those factors, for each of a broad range of different headphones?
Here, the stakes involved are quite low, and it’s also fairly unlikely that I’ll use the EPM again. (In contrast, GiveWell continues to use its models, with modifications, year after year, making the initial investment in constructing the models more worthwhile.) It seems the expected value of me bothering to do this EPM is lower than the expected value of me just reading a few reviews and then “going with my gut” (and thus saving time for other things).[2][3]
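To make that comparison concrete, here is a minimal sketch in Python. Every number in it is made up for illustration (the stakes, the fraction of the stakes a model might gain, the hours spent, and the value of an hour), and `net_value_of_modelling` is a hypothetical helper, not anything GiveWell or anyone else actually uses.

```python
def net_value_of_modelling(stakes, expected_gain_fraction, hours,
                           value_per_hour, reuses=1):
    """Expected benefit of building an EPM minus the time cost of building it.

    All inputs are illustrative assumptions:
    - stakes: value riding on each decision
    - expected_gain_fraction: share of the stakes we expect the model to
      gain us, relative to going with our gut
    - reuses: how many decisions the same model will inform
    """
    benefit = stakes * expected_gain_fraction * reuses
    cost = hours * value_per_hour
    return benefit - cost


# Hypothetical headphone purchase: low stakes, model used once.
headphones = net_value_of_modelling(
    stakes=100, expected_gain_fraction=0.05, hours=3, value_per_hour=20)

# Hypothetical high-stakes grantmaker: large stakes, model reused for years.
grantmaker = net_value_of_modelling(
    stakes=10_000_000, expected_gain_fraction=0.01, hours=300,
    value_per_hour=50, reuses=5)

print(f"Headphones: {headphones:+,.0f}")
print(f"Grantmaker: {grantmaker:+,.0f}")
```

Under these assumed inputs the model costs more than it gains for the headphones and pays off easily for the grantmaker. The point is only that stakes and reuse drive the answer, not that these particular figures are right.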
Does this mean that we must be dealing with “Knightian uncertainty” in this case, or must be utterly unable to “know” the relevant probabilities?
Not at all. In fact, I’d argue that the headphones example is actually one where, if I did spend a few hours doing research, I could come up with probabilities that are much more “trustworthy” than many of the probabilities involved in situations like GiveWell’s (when it is useful for people to construct EPMs). So I think the issue of time and effort costs may be quite separate even from the question of how trustworthy our probabilities are, let alone the idea that there might be a binary risk-uncertainty distinction.
Excluding some of one’s knowledge
Let’s say that I’m an experienced firefighter in a burning building (untrue on both counts, but go with me on this). I want to know the odds that the floor I’m on will collapse. I could (quite arbitrarily) construct the following EPM:
Probability of collapse = How hot the building is (on a scale from 0-1) * How non-sturdily the building seems to have been built (on a scale from 0-1)
I could also (quite arbitrarily) decide on values of 0.6 and 0.5, respectively. My model would then tell me that the probability of the floor collapsing is 0.3.
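Written out as code, this deliberately arbitrary model is a single multiplication. The function and parameter names below are my own labels for the two factors in the text.

```python
def collapse_probability(heat: float, flimsiness: float) -> float:
    """Toy EPM from the text: the product of two scores on a 0-1 scale."""
    for name, value in (("heat", heat), ("flimsiness", flimsiness)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return heat * flimsiness


p = collapse_probability(heat=0.6, flimsiness=0.5)
print(f"Estimated probability of collapse: {p:.2f}")
```

Note what the model cannot see: any cue that isn't one of its two inputs (such as how quiet the fire is) has no way of affecting the output.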
It seems like that could be done quite quickly, and while doing other things. So it seems that the time and effort costs involved in using this EPM are probably very similar to the costs involved in using an alternative approach (e.g., trusting my gut). Does this mean constructing an EPM here is a wise choice?
Intuitive expertise
There’s empirical evidence that the answer is “No” for examples like this; i.e., examples which meet the “conditions for intuitive expertise”:
an environment in which there’s a stable relationship between identifiable cues and later events or outcomes of actions
“adequate opportunities for learning the environment (prolonged practice and feedback that is both rapid and unequivocal)” (Kahneman & Klein)
In such situations, our intuitions may quite reliably predict later events. Furthermore, we may not consciously, explicitly know the factors that informed these intuitions. As Kahneman & Klein write: “Skilled judges are often unaware of the cues that guide them”.
Klein describes the true story that inspired my example, in which a team of firefighters were dealing with what they thought was a typical kitchen fire, when the lieutenant:
became tremendously uneasy — so uneasy that he ordered his entire crew to vacate the building. Just as they were leaving, the living room floor collapsed. If they had stood there another minute, they would have dropped into the fire below. Unbeknownst to the firefighters, the house had a basement and that’s where the fire was burning, right under the living room.
I had a chance to interview the lieutenant about this incident, and asked him why he gave the order to evacuate. The only reason he could think of was that he had extrasensory perception. He firmly believed he had ESP.
During the interview I asked him what he was aware of. He mentioned that it was very hot in the living room, much hotter than he expected given that he thought the fire was in the kitchen next door. I pressed him further and he recalled that, not only was it hotter than he expected, it was also quieter than he expected. Fires are usually noisy but this fire wasn’t. By the end of the interview he understood why it was so quiet: because the fire was in the basement, and the floor was muffling the sounds.
It seems that the lieutenant wasn’t consciously aware of the importance of the quietness of the fire. As such, if he’d constructed and relied on an EPM, he wouldn’t have included the quietness as a factor, and thus may not have pulled his crew out in time. But through a great deal of expertise, with reliable feedback from the environment, he was intuitively aware of the importance of that factor.
So when the conditions for intuitive expertise are met, methods other than EPMs may reliably outperform EPMs, even ignoring costs in time and energy, because they allow us to more fully leverage our knowledge.[4]
But, again, does this mean that we must be dealing with “Knightian uncertainty” in this case, or must be utterly unable to “know” the relevant probabilities? Again, not at all. In fact, the conditions for intuitive expertise would actually be met precisely when we could have relatively trustworthy probabilities—there have to be fairly stable patterns in the environment, and opportunities to learn these patterns. The issue is simply that, in practice, we often haven’t learned these probabilities on a conscious, explicit level.
On the flipside, using EPMs may often beat alternative methods when the conditions for intuitive expertise aren’t met, and this may be particularly likely when we face relatively “untrustworthy” probabilities.
Relatedly, it’s worth noting that feeling more confident in our intuitive assessment than in an EPM, in some situation, doesn’t necessarily mean our intuitive assessment is actually more reliable in that situation. As Kahneman & Klein note:
True experts, it is said, know when they don’t know. However, nonexperts (whether or not they think they are) certainly do not know when they don’t know. Subjective confidence is therefore an unreliable indication of the validity of intuitive judgments and decisions.
[...] Although true skill cannot develop in irregular or unpredictable environments, individuals will sometimes make judgments and decisions that are successful by chance. These “lucky” individuals will be susceptible to an illusion of skill and to overconfidence (Arkes, 2001). The financial industry is a rich source of examples.
Less measurable or legible things
An additional argument is that using EPs may make it harder to leverage knowledge about things that are less measurable and/or legible (with legibility seeming to approximately mean susceptibility to being predicted, understood, and monitored).
For example, let’s say Alice is deciding whether to donate to the Centre for Pesticide Suicide Prevention (CPSP), which focuses on advocating for policy changes, or to GiveDirectly, which simply gives unconditional cash transfers to people living in extreme poverty. She may decide CPSP’s impacts are “too hard to measure”, and “just can’t be estimated quantitatively”. Thus, if she uses EPs, she might neglect to even seriously consider CPSP. But if she considered in-depth, qualitative arguments, she might decide that CPSP seems a better bet.
I think it’s very plausible that this is a sort of situation where, in order to leverage as much of one’s knowledge as possible, it’s wise to use qualitative approaches. But we can still use EPs in these cases—we can just give our best guesses about the value of variables we can’t measure, and about what variables to consider and how to structure our model. (And in fact, GiveWell did construct a quantitative cost-effectiveness model for CPSP.) And it’s not obvious to me which of these approaches would typically make it easier for us to leverage our knowledge in these less measurable and legible cases.
Finally, what implications might this issue have for the idea of a binary risk-uncertainty distinction? I disagree with Alice’s view that CPSP’s impacts “just can’t be estimated quantitatively”. The reality is simply that CPSP’s impacts are very hard to estimate, and that the probabilities we’d arrive at if we estimated them would be relatively untrustworthy. In contrast, our estimates of GiveDirectly’s impact would be more trustworthy. That’s all we need to say to make sense of the idea that this is (perhaps) a situation in which we should use approaches other than EPs; I don’t think we need to even act as if there’s a binary risk-uncertainty distinction.
Causing overconfidence; underestimating the value of information
Two common critiques of using EPs are that:
Using EPs tends to make people overconfident about their estimates (and their models’ outputs); that is, it makes them underestimate how uncertain these estimates or outputs are.[5]
Therefore, using EPs tends to make people underestimate the value of (additional) information (VoI; here “information” can be seen as including just doing more thinking, without actually gathering more empirical data).
These critiques are closely related, so I’ll discuss both in this section.
An example of the first of those critiques comes from Chris Smith. Smith discusses one particular method for dealing with “poorly understood uncertainty”, and then writes:
Calling [that method] “making a Bayesian adjustment” suggests that we have something like a general, mathematical method for critical thinking. We don’t.
Similarly, taking our hunches about the plausibility of scenarios we have a very limited understanding of and treating those hunches like well-grounded probabilities can lead us to believe we have a well-understood method for making good decisions related to those scenarios. We don’t.
Many people have unwarranted confidence in approaches that appear math-heavy or scientific. In my experience, effective altruists are not immune to that bias.
An example of (I think) both of those critiques together comes from Daniela Waldhorn:
The existing gaps in this field of research entail that we face significant constraints when assessing the probability that an invertebrate taxon is conscious. In my opinion, the current state of knowledge is not mature enough for any informative numerical estimation of consciousness among invertebrates. Furthermore, there is a risk that such estimates lead to an oversimplification of the problem and an underestimation of the need for further investigation.
I’m somewhat sympathetic to these arguments. But I think it’s very unclear whether arguments about overconfidence and VoI should push us away from rather than towards using EPs; it really seems to me like it could go either way. This is for two reasons.
Firstly, we can clearly represent low confidence in our EPs, by:
using a probability distribution, rather than just a point estimate
giving that distribution (arbitrarily) wide confidence intervals
choosing the shape of that distribution to further represent the magnitude (and nature) of our uncertainty. (See this comment for diagrams.)
conducting sensitivity analyses, which show the extent to which plausible variations in our model’s inputs can affect our model’s outputs
visually representing these probability distributions and sensitivity analyses (which may make our uncertainty more striking and harder to ignore)
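As a rough sketch of the first few items in that list, the Python below reuses the toy firefighter model from earlier, but replaces each point estimate with a range and reports an interval plus a one-at-a-time sensitivity analysis. The uniform distributions and their ranges are assumptions I've picked purely for illustration.

```python
import random


def model(heat, flimsiness):
    """Toy EPM from the firefighter example: product of two 0-1 scores."""
    return heat * flimsiness


# Assumed (made-up) plausible ranges for each input.
RANGES = {"heat": (0.4, 0.8), "flimsiness": (0.2, 0.8)}


def simulate(n=10_000, seed=0):
    """Draw each input uniformly from its range; return sorted outputs."""
    rng = random.Random(seed)
    return sorted(
        model(rng.uniform(*RANGES["heat"]), rng.uniform(*RANGES["flimsiness"]))
        for _ in range(n)
    )


def interval(samples, lo=0.05, hi=0.95):
    """Empirical interval read off an already-sorted list of samples."""
    return samples[int(lo * len(samples))], samples[int(hi * len(samples))]


def one_at_a_time_sensitivity():
    """Swing in the output when one input spans its range, others at midpoint."""
    mid = {k: (a + b) / 2 for k, (a, b) in RANGES.items()}
    swings = {}
    for name, (a, b) in RANGES.items():
        low = model(**{**mid, name: a})
        high = model(**{**mid, name: b})
        swings[name] = high - low
    return swings


samples = simulate()
low, high = interval(samples)
print(f"90% interval for P(collapse): {low:.2f} to {high:.2f}")
for name, swing in one_at_a_time_sensitivity().items():
    print(f"Output swing from varying {name}: {swing:.2f}")
```

With these assumed ranges, the output is more sensitive to the sturdiness score than to the heat score, which is the kind of thing a bare point estimate would hide.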
Secondly, if we do use EPs (and appropriately wide confidence intervals), this unlocks ways of moving beyond just the general idea that further information would be valuable; it lets us also:
explicitly calculate how valuable more info seems likely to be
identify which uncertainties it’d be most valuable to gather more info on
In fact, there’s an entire body of work on VoI analysis, and a necessary prerequisite for conducting such an analysis is having an EPM.
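For a sense of what such a calculation can look like, here is a minimal sketch of one standard VoI quantity, the expected value of perfect information (EVPI), for a decision with two options and two possible states of the world. The option names, payoffs, and probabilities are all hypothetical.

```python
def expected_value(payoffs_by_state, probs):
    """Probability-weighted average payoff of one option."""
    return sum(p * v for p, v in zip(probs, payoffs_by_state))


def evpi(payoffs, probs):
    """Expected value of perfect information.

    payoffs maps each option to its payoff in each state of the world;
    probs gives our current probability for each state.
    """
    # Best we can do now: commit to the option with the highest expected value.
    best_without_info = max(expected_value(row, probs)
                            for row in payoffs.values())
    # Best we could do if we learned the state first: pick the best option
    # in each state, then average over how likely each state is.
    best_with_info = sum(
        p * max(row[i] for row in payoffs.values())
        for i, p in enumerate(probs)
    )
    return best_with_info - best_without_info


# Hypothetical payoffs: a risky option and a safe one.
payoffs = {"risky": [100, -20], "safe": [30, 30]}

print(f"EVPI when quite unsure: {evpi(payoffs, [0.4, 0.6]):.1f}")
print(f"EVPI when nearly sure:  {evpi(payoffs, [0.99, 0.01]):.1f}")
```

With these numbers, resolving the uncertainty is worth about 28 units when the states are at 40/60, and only about 0.5 when we are already 99% sure. Without explicit probabilities there would be nothing to plug into this calculation.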
It does seem plausible to me that, even if we do all of those things, we or others will primarily focus on our (perhaps implicit) point estimate, and overestimate its trustworthiness, just due to human psychology (or EA/rationalist psychology). But that doesn’t seem obvious. Nor does it seem obvious that the overconfidence that may result from using EPs will tend to be greater than the overconfidence that may result from other approaches (like relying on all-things-considered intuitions; recall Kahneman & Klein’s comments from earlier).
And in any case, this whole discussion was easy to have just in terms of very untrustworthy or low-confidence probabilities. There was no need to invoke the idea of a binary risk-uncertainty distinction, or the idea that there are some matters about which we simply can’t possibly estimate any probabilities.[6]
The optimizer’s curse
Smith gives a “rough sketch” of the optimizer’s curse:
Optimizers start by calculating the expected value of different activities.
Estimates of expected value involve uncertainty.
Sometimes expected value is overestimated, sometimes expected value is underestimated.
Optimizers aim to engage in activities with the highest expected values.
Result: Optimizers tend to select activities with overestimated expected value.
[...] The optimizer’s curse occurs even in scenarios where estimates of expected value are unbiased (roughly, where any given estimate is as likely to be too optimistic as it is to be too pessimistic).
[...] As uncertainty increases, the degree to which the cost-effectiveness of the optimal-looking program is overstated grows wildly.
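Smith’s sketch can be checked with a small simulation. The Python below is my own illustration, not Smith’s code: it assumes true values and estimation errors are both normally distributed, that errors are unbiased and independent, and that we always pick the option with the highest estimate.

```python
import random
import statistics


def average_overstatement(n_options=10, noise_sd=1.0, trials=5_000, seed=0):
    """Mean of (estimate - true value) for the option that looks best.

    Each trial draws a true value for every option from N(0, 1), adds
    unbiased N(0, noise_sd) error to get an estimate, and then selects
    the option with the highest estimate.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        estimates = [v + rng.gauss(0.0, noise_sd) for v in true_values]
        best = max(range(n_options), key=estimates.__getitem__)
        gaps.append(estimates[best] - true_values[best])
    return statistics.mean(gaps)


for sd in (0.5, 1.0, 2.0):
    gap = average_overstatement(noise_sd=sd)
    print(f"noise sd {sd}: selected option overstated by {gap:.2f} on average")
```

Even though every individual estimate is unbiased, the selected option’s estimate is too high on average, and the gap widens as the noise grows.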
The implications of, and potential solutions to, the optimizer’s curse seem to be complicated and debatable. For more detail, see this post, Smith’s post, comments on Smith’s post, and discussion of the related problem of Goodhart’s law.
As best I can tell:
The optimizer’s curse is likely to be a pervasive problem and is worth taking seriously.
In many situations, the curse will just indicate that we’re probably overestimating how much better our top-ranked option is than the alternatives; it won’t indicate that we should actually change which option we pick.
But the curse can indicate that we should pick an option other than that which we estimate is best, if (a) we have reason to believe that our estimate of the value of the best option is more uncertain than our estimate of the value of the other options, and (b) we don’t model that information.
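The last point can be illustrated with one common way of “modelling that information”: shrinking each estimate towards a prior, with noisier estimates shrunk more. The sketch below assumes a normal prior and normal estimation error (the standard normal-normal update); the options, estimates, and standard deviations are hypothetical.

```python
def shrunk_estimate(estimate, noise_sd, prior_mean, prior_sd):
    """Posterior mean under a normal prior and normal estimation error.

    The weight on the estimate falls as its noise grows, so less
    trustworthy estimates are pulled harder towards the prior mean.
    """
    weight = prior_sd**2 / (prior_sd**2 + noise_sd**2)
    return prior_mean + weight * (estimate - prior_mean)


# Hypothetical options: A looks best but rests on a much noisier estimate.
options = {
    "A": {"estimate": 10.0, "noise_sd": 5.0},
    "B": {"estimate": 7.0, "noise_sd": 1.0},
}
PRIOR_MEAN, PRIOR_SD = 3.0, 2.0  # assumed prior over option values

adjusted = {
    name: shrunk_estimate(o["estimate"], o["noise_sd"], PRIOR_MEAN, PRIOR_SD)
    for name, o in options.items()
}

raw_best = max(options, key=lambda name: options[name]["estimate"])
adjusted_best = max(adjusted, key=adjusted.get)
print(f"Best by raw estimate:      {raw_best}")
print(f"Best by adjusted estimate: {adjusted_best}")
```

With these assumed numbers the ranking flips: A has the higher raw estimate, but B comes out ahead once each estimate is discounted for its noise. If both options had the same noise, the shrinkage would lower both values but leave the ranking unchanged.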
I’ve deliberately kept the above points brief (again, see the above links for further explanations and justifications). This is because those points, while clearly relevant to how to use EPs, are only relevant to when to use EPs (vs alternative approaches) if the optimizer’s curse is a larger problem when using EPs than when using alternative approaches. And I don’t think it necessarily is. For example, Smith notes:
The optimizer’s curse can show up even in situations where effective altruists’ prioritization decisions don’t involve formal models or explicit estimates of expected value. Someone informally assessing philanthropic opportunities in a linear manner might have a thought like:
“Thing X seems like an awfully big issue. Funding Group A would probably cost only a little bit of money and have a small chance leading to a solution for Thing X. Accordingly, I feel decent about the expected cost-effectiveness of funding Group A.
Let me compare that to how I feel about some other funding opportunities…”
Although the thinking is informal, there’s uncertainty, potential for bias, and an optimization-like process. [quote marks added, because I couldn’t double-indent]
This makes a lot of sense to me. But Smith also adds:
Informal thinking isn’t always this linear. If the informal thinking considers an opportunity from multiple perspectives, draws on intuitions, etc., the risk of [overestimating the cost-effectiveness of the optimal-looking program] may be reduced.
I’m less sure what he means by this. I’m guessing he simply means that using multiple, different perspectives means that the various errors and uncertainties are likely to “cancel out” to some extent, reducing the effective uncertainty, and thus reducing the amount by which one is likely to overestimate the value of the best-seeming thing. But if so, it seems that this partial protection could also be achieved by using multiple, different EPMs, making different assumptions in them, getting multiple people to estimate values for inputs, etc.
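Here is a rough simulation of that last suggestion, under a strong assumption that should be flagged loudly: the errors of the different models are independent of each other. If several EPMs share the same blind spots, averaging them will help much less than this suggests. The setup (normal true values, normal unbiased errors, pick the highest pooled estimate) is my own illustrative choice.

```python
import random
import statistics


def overstatement(models_per_option, n_options=10, noise_sd=1.0,
                  trials=4_000, seed=1):
    """Average overstatement of the best-looking option when each option's
    estimate is the mean of several independent noisy models."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        pooled = [
            # Assumption: each model's error is an independent draw.
            sum(v + rng.gauss(0.0, noise_sd)
                for _ in range(models_per_option)) / models_per_option
            for v in true_values
        ]
        best = max(range(n_options), key=pooled.__getitem__)
        gaps.append(pooled[best] - true_values[best])
    return statistics.mean(gaps)


for k in (1, 5):
    print(f"{k} model(s) per option: overstated by {overstatement(k):.2f}")
```

Under these assumptions, pooling five models shrinks the overstatement substantially but doesn’t remove it, which matches the idea of “partial protection”.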
So ultimately, I think that the problem Smith raises is significant, but I’m quite unsure if it’s a downside of using EPs instead of alternative approaches.
I also don’t think that the optimizer’s curse suggests it’d be valuable to act as if there’s a binary risk-uncertainty distinction. It is clear that the curse gets worse as uncertainty increases (i.e., when one’s probabilities are less trustworthy), but it does so in a gradual, continuous manner. So it seems to me that, again, we’re best off speaking just in terms of more and less trustworthy probabilities, rather than imagining that totally different behaviours are warranted if we’re facing “risk” rather than “Knightian uncertainty”.[7]
Anchoring
Anchoring or focalism is a cognitive bias where an individual depends too heavily on an initial piece of information offered (considered to be the “anchor”) when making decisions. (Wikipedia)
One critique of using EPs, or at least making them public, seems to effectively be that people may become anchored on the EPs given. For example, Jason Schukraft writes:
I contend that publishing specific estimates of invertebrate sentience (e.g., assigning each taxon a ‘sentience score’) would be, at this stage of investigation, at best unhelpful and probably actively counterproductive. [...]
Of course, having studied the topic for some time now, I expect that my estimates would be better than the estimates of the average member of the EA community. If that’s true, then it’s tempting to conclude that making my estimates public would improve the community’s overall position on this topic. However, I think there are at least three reasons to be skeptical of this view.
[One reason is that] It’s difficult to present explicit estimates of invertebrate sentience in a way in which those estimates don’t steal the show. It’s hard to imagine a third party summarizing our work (either to herself or to others) without mentioning lines like ‘Rethink Priorities think there is an X% chance ants have the capacity for valenced experience.’ There are very few serious estimates of invertebrate sentience available, so members of the community might really fasten onto ours.
I think that this critique has substantial merit, but that this is most clear in relation to making EPs public, rather than just in relation to using EPs oneself. As Schukraft writes:
To be clear: I don’t believe it’s a bad idea to think about probabilities of sentience. In fact, anyone directly working on invertebrate sentience ought to be periodically recording their own estimates for various groups of animals so that they can see how their credences change over time.[8]
I expect that one can somewhat mitigate this issue by providing various strong caveats when EPs are quite untrustworthy. And I think somewhat similar issues can also occur when not using EPs (e.g., if just saying something is “very likely”, or giving a general impression of disapproval of what a certain organisation is doing).
But I doubt that caveats would entirely remove the issue.[9] And I’d guess that the anchoring would be worse if using EPs than if not.
Finally, anchoring does seem a more important downside when one’s probabilities are less trustworthy, because then the odds people will be anchored to a bad estimate are higher. But again, it seems easy, and best, to think about this in terms of more and less trustworthy probabilities, rather than in terms of a binary risk-uncertainty distinction.
Reputational issues
Finally, in the same post, Schukraft notes another issue with using EPs:
Sentience scores might reduce our credibility with potential collaborators
[....] science, especially peer-reviewed science, is an inherently conservative enterprise. Scientists simply don’t publish things like probabilities of sentience. For a long time, even the topic of nonhuman sentience was taboo because it was seen as unverifiable. Without a clear, empirically-validated methodology behind them, such estimates would probably not make it into a reputable journal. Intuitions, even intuitions conditioned by careful reflection, are rarely admitted in the court of scientific opinion.
Rethink Priorities is a new, non-academic organization, and it is part of a movement that is—frankly—sort of weird. To collaborate with scientists, we first need to convince them that we are a legitimate research outfit. I don’t want to make that task more challenging by publishing estimates that introduce the perception that our research isn’t rigorous. And I don’t think that perception would be entirely unwarranted. Whenever I read a post and encounter an overly precise prediction for a complex event (e.g., ‘there is a 16% chance Latin America will dominate the plant-based seafood market by 2025’), I come away with the impression that the author doesn’t sufficiently appreciate the complexity of the forces at play. There may be no single subject more complicated than consciousness. I don’t want to reduce that complexity to a number.
Some of my thoughts on this potential downside mirror those I gave with regard to anchoring:
This does seem like it would often be a real downside, and worth taking seriously.
This seems most clearly a downside of making EPs public, rather than of using EPs in one’s own thinking (or within a specific organisation or community).
This downside does seem more prominent the less trustworthy one’s probabilities would be.
But unlike all the other downsides I’ve covered, this one does seem like it might warrant acting (in public) as if there is a binary risk-uncertainty distinction. This is because the people one wants to maintain a good reputation with may think there is such a distinction (or effectively think as if that’s true). But it should be noted that this only requires publicly acting as if there’s such a distinction; you don’t have to think as if there’s such a distinction.
One last thing to note is that it also seems possible that similar reputational issues could result from not using EPs. For example, if one relies on qualitative or intuitive approaches, one’s thinking may be seen as “hand-wavey”, “soft”, and/or imprecise by people from a more “hard science” background.
Conclusions
- There are some real downsides that can occur in practice when actual humans use EPs (or EPMs, or maximising expected utility)
- But some downsides that have been suggested (particularly causing overconfidence and understating the VoI) might actually be more pronounced for approaches other than using EPs
- Some downsides (particularly relating to the optimizer’s curse, anchoring, and reputational issues) may be more pronounced when the probabilities one has (or could have) are less trustworthy
- Other downsides (particularly excluding one’s intuitive knowledge) may be more pronounced when the probabilities one has (or could have) are more trustworthy
- Only one downside (reputational issues) seems to provide any argument for even acting as if there’s a binary risk-uncertainty distinction
  - And even in that case the argument is quite unclear, and wouldn’t suggest we should use the idea of such a distinction in our own thinking
- The above point, combined with arguments I made in an earlier post, makes me believe that we should abandon the concept of the risk-uncertainty distinction in our own thinking (and at least most communication), and that we should think instead in terms of:
  - a continuum of more to less trustworthy probabilities
  - the practical upsides and downsides of using EPs, for actual humans.
I’d be interested in people’s thoughts on all of the above; one motivation for writing this post was to see if someone could poke holes in, and thus improve, my thinking.
- ↩︎
I should note that this post basically takes as a starting assumption the Bayesian interpretation of probability, “in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief” (Wikipedia). But I think at least a decent amount of what I say would hold for other interpretations of probability (e.g., frequentism).
- ↩︎
Of course, I could quickly and easily make an extremely simplistic EPM, or use just a single EP. But then it’s unclear if that’d do better than similarly quick and easy alternative approaches, for the reasons discussed in the following sections. For a potentially contrasting perspective, see Using a Spreadsheet to Make Good Decisions: Five Examples.
- ↩︎
This seems analogous to the idea that utilitarianism itself may often recommend against the action of trying to explicitly calculate which action would be recommended by utilitarianism (given that that’s likely to slow one down massively). Amanda Askell has written a post on that topic, in which she says: “As many utilitarians have pointed out, the act utilitarian claim that you should ‘act such that you maximize the aggregate wellbeing’ is best thought of as a criterion of rightness and not as a decision procedure. In fact, trying to use this criterion as a decision procedure will often fail to maximize the aggregate wellbeing. In such cases, utilitarianism will actually say that agents are forbidden to use the utilitarian criterion when they make decisions.”
- ↩︎
Along similar lines, Holden Karnofsky (of GiveWell, at the time) writes: “It’s my view that my brain instinctively processes huge amounts of information, coming from many different reference classes, and arrives at a prior; if I attempt to formalize my prior, counting only what I can name and justify, I can worsen the accuracy a lot relative to going with my gut.”
- ↩︎
This is different to the idea that people may tend to overestimate EPs, or overestimate cost-effectiveness, or things like that. That claim is also often made, and is probably worth discussing, but I leave it out of this post. Here I’m focusing instead on the separate possibility of people being overconfident about the accuracy of whatever estimate they’ve arrived at, whether it’s high or low.
- ↩︎
Here’s Nate Soares making similar points: “In other words, even if my current credence is 50% I can still expect that in 35 years (after encountering a black swan or two) my credence will be very different. This has the effect of making me act uncertain about my current credence, allowing me to say “my credence for this is 50%” without much confidence. So long as I can’t predict the direction of the update, this is consistent Bayesian reasoning.
As a bounded Bayesian, I have all the behaviors recommended by those advocating Knightian uncertainty. I put high value on increasing my hypothesis space, and I often expect that a hypothesis will come out of left field and throw off my predictions. I’m happy to increase my error bars, and I often expect my credences to vary wildly over time. But I do all of this within a Bayesian framework, with no need for exotic “immeasurable” uncertainty.”
Smith’s own views on this point seem a bit confusing. At one point, he writes: “we don’t need to assume a strict dichotomy separates quantifiable risks from unquantifiable risks. Instead, real-world uncertainty falls on something like a spectrum.” But at various other points, he writes things like “The idea that all uncertainty must be explainable in terms of probability is a wrong-way reduction [i.e., a bad idea; see his post for details]”, and “I don’t think ignorance must cash out as a probability distribution”.
While I think this is a good point, I also think it may sometimes be worth considering the risk that one might anchor oneself to one’s own estimate. This could therefore be a downside of even just generating an EP oneself, not just of making EPs public.
I briefly discuss empirical findings that are somewhat relevant to these points here.
Kudos for this write-up, and for your many other posts (both here and on LessWrong, it seems) on uncertainty.
Overall, I’m very much in the “Probabilities are pretty great and should eventually be used for most things” camp. That said, I think the “Scout vs. Soldier” mindset is useful, so to speak; investigating both sides is pretty useful. I’d definitely assign some probability to being wrong here.
My impression is that we’re probably in broad agreement here.
Some quick points that come to mind:
The debate on “are explicit probabilities useful” is very similar to those of “are metrics useful”, “are cost-benefit analyses useful”, and “is consequentialist reasoning useful.” I expect that there’s broad correlation between those who agree/disagree with these.
In cases where probabilities are expected to be harmful, hopefully probabilities could be used to tell us as much. For example, we could predict that explicit and public use would be harmful.
I’d definitely agree that it’s very possible to use probabilities poorly. I think a lot of Holden’s criticisms here would fall into this camp. Neural nets were honestly quite poor for a while, but thankfully that didn’t lead scientists to abandon them. I think probabilities are a lot better now, but we could learn to get much better than them later. I’m not sure how we can get much better without them.
The optimizer’s curse can be adjusted for with reasonable use of Bayes. Bayesian hierarchical models should deal with it quite well. There’s been some discussion of this around “Goodhart” on LessWrong.
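As a rough illustration of the adjustment described above, here is a minimal simulation sketch. It is not taken from the comment or from any of the linked posts. It assumes that true values come from a known normal prior with mean 0 and that every estimate has the same known normal noise; the function name `simulate` and all parameter values are hypothetical. Under those (strong) assumptions, the posterior mean is simply the estimate multiplied by a shrinkage factor.

```python
import random
import statistics

def simulate(n_options=20, n_trials=2000, prior_sd=1.0, noise_sd=2.0, seed=0):
    """Average (estimate - truth) for the top-ranked option, naive vs. shrunk.

    Assumption: truths ~ Normal(0, prior_sd) and estimates = truth + Normal(0, noise_sd),
    with both standard deviations known to the analyst.
    """
    rng = random.Random(seed)
    # Posterior-mean shrinkage factor for a normal prior (mean 0) and normal noise.
    shrink = prior_sd**2 / (prior_sd**2 + noise_sd**2)
    naive_gap, shrunk_gap = [], []
    for _ in range(n_trials):
        truths = [rng.gauss(0.0, prior_sd) for _ in range(n_options)]
        estimates = [t + rng.gauss(0.0, noise_sd) for t in truths]
        # Pick the option whose estimate looks best.
        best = max(range(n_options), key=lambda i: estimates[i])
        naive_gap.append(estimates[best] - truths[best])
        shrunk_gap.append(shrink * estimates[best] - truths[best])
    return statistics.mean(naive_gap), statistics.mean(shrunk_gap)

naive_bias, shrunk_bias = simulate()
print(f"naive estimate minus truth:  {naive_bias:.2f}")
print(f"shrunk estimate minus truth: {shrunk_bias:.2f}")
```

In this toy setup the naive estimate of the chosen option overshoots its true value on average, while the shrunk estimate is roughly unbiased. Because every option has the same noise, shrinkage does not change which option ranks first; it only tempers how good that option is expected to be. Whether the prior and noise levels can be pinned down well enough in practice is exactly what is disputed elsewhere in this thread.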
I think I agree with pretty much all of that. And I’d say my position is close to yours, though slightly different; I might phrase mine like: “My understanding is that probabilities should always be used by ideal, rational agents with unlimited computational abilities etc. (Though that’s still slightly ‘received wisdom’ for me.) And I also think that most people, and perhaps even most EAs and rationalists, should use probabilities more often. But I doubt they should actually be used for most tiny decisions, by actual humans. And I think they’ve sometimes been used with far too little attention to their uncertainty. But I also think that this really isn’t an intrinsic issue with probabilities, and that intuitions are obviously also very often used overconfidently.”
(Though this post wasn’t trying to argue for that view, but rather to explore the potential downsides relatively neutrally and just see what that revealed.)
I’m not sure I know what you mean by the following two statements: “Probabilities [...] should eventually be used for most things” and “I think probabilities are a lot better now, but we could learn to get much better than them later.” Could you expand on those points? (E.g., would you say we should eventually use probabilities even the 100th time we make the same decision as before about what to put in our sandwiches?)
Other points:
1. Yes, I share that view. But I think it’s also interesting to note it’s not a perfect correlation. E.g. Roser writes:
2. Yes, I agree. Possibly I should’ve emphasised that more. I allude to a similar point with “It seems the expected value of me bothering to do this EPM is lower than the expected value of me just reading a few reviews and then “going with my gut” (and thus saving time for other things)”, and the accompanying footnote about utilitarianism.
4. I think I’ve seen what you’re referring to, e.g. in lukeprog’s post on the optimizer’s curse. And I think the basic idea makes sense to me (though not to the extent I could actually act on it right away if you handed me some data). But Chris Smith quotes the proposed solution, and then writes:
That seems to me like at least a reason to expect the proposed solution to not work very well. My guess would be that we can still use our best guesses to make adjustments (e.g., just try to quantify our vague sense that a randomly chosen charity wouldn’t be very cost-effective), but I don’t think I understand the topic well enough to speak on that, really.
(And in any case, I’m not sure it’s directly relevant to the question of whether we should use EPs anyway, because, as covered in this post, it seems like the curse could affect alternative approaches too, and like the curse doesn’t mean we should abandon our best guess, just that we should be more uncertain about it.)
Hm… Some of this would take a lot more writing than would make sense in a blog post.
On overconfidence in probabilities vs. intuitions: I think I mostly agree with you. One cool thing about probabilities is that they can be much more straightforwardly verified/falsified and measured using metrics for calibration. If we had much larger systems, I believe we could do a great deal of work to better ensure calibration with defined probabilities.
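To make the point about verification concrete, here is a minimal sketch of two common ways to score stated probabilities against outcomes: the Brier score and a simple calibration table. The choice of these two metrics is an assumption (the comment does not name a specific metric), and the forecasts and outcomes below are made-up illustrative data.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration_table(forecasts, outcomes, n_bins=5):
    """Group forecasts into probability bins; return (mean forecast, observed frequency, count) per bin."""
    bins = {}
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # a forecast of 1.0 goes in the top bin
        bins.setdefault(idx, []).append((p, o))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(o for _, o in pairs) / len(pairs)
        rows.append((mean_p, freq, len(pairs)))
    return rows

# Hypothetical track record: stated probabilities and whether each event happened.
forecasts = [0.9, 0.8, 0.2, 0.6]
outcomes = [1, 1, 0, 0]

score = brier_score(forecasts, outcomes)
rows = calibration_table(forecasts, outcomes)
print(f"Brier score: {score:.4f}")
for mean_p, freq, count in rows:
    print(f"stated about {mean_p:.2f}, happened {freq:.2f} of the time (n={count})")
```

A well-calibrated forecaster would show observed frequencies close to the stated probabilities in each bin, given enough forecasts. A vague gut feeling that was never written down as a number cannot be scored this way.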
I’m not saying that humans should come up with unique probabilities for most things on most days. One example I’d consider “used for most things” is a case where an AI uses probabilities to tell humans which actions seem the best, and humans go with what the AI states. Something similar could be said for “a trusted committee” that uses probabilities as an in-between.
I think there are strong claims that topics like Bayes, Causality, Rationality even, are still relatively poorly understood, and may be advanced a lot in the next 30-100 years. As we get better with them, I predict we would get better at formal modeling.
This is a complicated topic. I think a lot of Utilitarians/Consequentialists wouldn’t deem many interpretations of rights as metaphysical or terminally-valuable things. Another way to look at it would be to attempt to map the rights to a utility function. Utility functions require very, very few conditions. I’m personally a bit cynical of values that can’t be mapped to utility functions, even if only in a highly uncertain way.
Kudos for identifying that post. The main solution I was referring to was the one described in the second comment:
The optimizer’s curse arguably is basically within the class of Goodhart-like problems https://www.lesswrong.com/posts/5bd75cc58225bf06703754b2/the-three-levels-of-goodhart-s-curse
I’m not saying that these are easy to solve, but rather, there is a mathematical strategy to generally fix them in ways that would make sense intuitively. There’s no better approach than to try to approximate the mathematical approach, or go with an approach that in-expectation does a decent job at approximating the mathematical approach.
That all seems to make sense to me. Thanks for the interesting reply!
Just found this post, coming in to comment a year late—Thanks Michael for the thoughtful post and Ozzie for the thoughtful comments!
I might agree with you about what’s (in some sense) mathematically possible (in principle). In practice, I don’t think people trying to approximate the ideal mathematical approach are going to have a ton of success (for reasons discussed in my post and quoted in Michael’s previous comment).
I don’t think searching for “an approach that in-expectation does a decent job at approximating the mathematical approach” is pragmatic.
In most important scenarios, we’re uncertain what approaches work well in-expectation. Our uncertainty about what works well in-expectation is the kind of uncertainty that’s hard to hash out in probabilities. A strict Bayesian might say, “That’s not a problem—with even more math, the uncertainty can be handled....”
While you can keep adding more math and technical patches to try and ground decision making in Bayesianism, pragmatism eventually pushes me in other directions. I think David Chapman explains this idea a hell of a lot better than I can in Rationalism’s Responses To Trouble.
Getting more concrete:
Trusting my gut or listening to domain experts might turn out to be approaches that work well in some situation. If one of these approaches works, I’m sure someone could argue in hindsight that an approach works because it approximates an idealized mathematical approach. But I’m skeptical of the merits of work done in the reverse (i.e., trying to discover non-math approaches by looking for things that will approximate idealized mathematical approaches).
Hmm, I feel like you may be framing things quite differently to how I would, or something. My initial reaction to your comment is something like:
It seems useful to conceptually separate data collection from data processing, where by the latter I mean using that data to arrive at probability estimates and decisions.
I think Bayesianism (in the sense of using Bayes’ theorem and a Bayesian interpretation of probability) and “math and technical patches” might tend to be part of the data processing, not the data collection. (Though they could also guide what data to look for. And this is just a rough conceptual divide.)
When Ozzie wrote about going with “an approach that in-expectation does a decent job at approximating the mathematical approach”, he was specifically referring to dealing with the optimizer’s curse. I’d consider this part of data processing.
Meanwhile, my intuitions (i.e., gut reactions) and what experts say are data. Attending to them is data collection, and then we have to decide how to integrate that with other things to arrive at probability estimates and decisions.
I don’t think we should see ourselves as deciding between either Bayesianism and “math and technical patches” or paying attention to my intuitions and domain experts. You can feed all sorts of evidence into Bayes’ theorem. I doubt any EA would argue we should form conclusions from “Bayesianism and math alone”, without using any data from the world (including even their intuitive sense of what numbers to plug in, or whether people they share their findings with seem skeptical). I’m not even sure what that’d look like.
And I think my intuitions or what domain experts say can very easily be made sense of as valid data within a Bayesian framework. Generally, my intuitions and experts are more likely to indicate X is true in worlds where X is true than where it’s not. This effect is stronger when the conditions for intuitive expertise are met, when experts’ incentives seem to be well aligned with seeking and sharing truth, etc. This effect is weaker when it seems that there are strong biases or misaligned incentives at play, or when it seems there might be.
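The idea in the paragraph above can be sketched as a one-step Bayes update in which “an expert says X” is the evidence. The function name `update` and all the numbers are hypothetical; they are chosen only to show that a more trustworthy source moves the estimate further than a source that says X almost regardless of the truth.

```python
def update(prior, p_signal_if_true, p_signal_if_false):
    """Posterior probability of X after observing a signal (e.g., an expert asserting X)."""
    numerator = prior * p_signal_if_true
    return numerator / (numerator + (1 - prior) * p_signal_if_false)

prior = 0.3  # hypothetical credence in X before hearing from anyone

# Assumed case 1: conditions for expertise are met and incentives are well aligned,
# so the expert says "X" far more often when X is true than when it is false.
posterior_reliable = update(prior, p_signal_if_true=0.8, p_signal_if_false=0.2)

# Assumed case 2: strong biases or misaligned incentives, so the expert says "X"
# nearly as often when X is false as when it is true.
posterior_biased = update(prior, p_signal_if_true=0.6, p_signal_if_false=0.5)

print(f"reliable expert: {prior:.2f} -> {posterior_reliable:.2f}")
print(f"biased expert:   {prior:.2f} -> {posterior_biased:.2f}")
```

The same statement from the two sources counts as evidence in both cases, but the size of the update depends on how much more likely the statement is in worlds where X is true.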
(Perhaps this is talking past you? I’m not sure I understood your argument.)
I largely agree with what you said in this comment, though I’d say the line between data collection and data processing is often blurred in real-world scenarios.
I think we are talking past each other (not in a bad faith way though!), so I want to stop myself from digging us deeper into an unproductive rabbit hole.
Hey Michael, thanks for having written this post.
Some other downsides and thoughts:
One thing with explicit CEAs is that it seems to me that they might be useless at the beginning, until they are refined, shared, criticized, tested, etc. E.g., I don’t know what the time-to-beat-intuition is.
There is a super-category for the “reputational issues”, but I don’t really know what to call it. Maybe “effects on others.” Other members of that category might be:
Explicit probabilities might hurt community building because they put off people who don’t like them.
Explicit probabilities might show that previous undertakings which people thought were worth it are not, but stopping those undertakings would be bad for movement-building/people’s egos.
E.g., it turns out that the pathway to impact of giving out EA books is on keeping the movement builders motivated, but they don’t end up being very motivated if they think that it otherwise has ~0 impact.
One danger with explicit models might be that they end up being fetishized or implemented in a cargo-cult kind of way, where they are done without understanding the purpose of their existence.
Ok, I’ll flag this too. I’m sure there are statistical situations where an extreme outcome implies that an adjustment for correlation goodharting would make it seem worse than other options; i.e. change order.
That said, I’d guess this isn’t likely to happen that often for realistic cases, especially when there aren’t highly extreme outliers (which, to be fair, we do have with EA).
I think one mistake someone could make here would be to say that because the ordering may be preserved, the problem wouldn’t be “fixed” at all. But, the uncertainties and relationships themselves are often useful information outside of ordering. So a natural conclusion in the case of intense noise (which leads to the optimizer’s curse) would be to accept a large amount of uncertainty, and maybe use that knowledge to be more conservative; for instance, trying to get more data before going all-in on anything in particular.
Yeah, I think all of that’s right. I ended up coincidentally finding my way to a bunch of stuff about Goodhart on LW that I think is what you were referring to in another comment, and I’ve realised my explanation of the curse moved too fast and left out details. I think I was implicitly imagining that we’d already adjusted for what we know about the uncertainties of the estimates of the different options—but that wasn’t made clear.
I’ve now removed the sentence you quote (as I think it was unnecessary there anyway), and changed my earlier claims to:
Now, that’s not very clear, but I think it’s more accurate, at least :D
I think that makes sense. Some of it is a matter of interpretation.
From one perspective, the optimizer’s curse is a dramatic and challenging dilemma facing modern analysis. From another perspective, it’s a rather obvious and simple artifact from poorly-done estimates.
I.e., it’s sometimes said that once mathematicians realize something is possible, they consider the problem trivial. Here, the optimizer’s curse is considered a reasonably well-understood phenomenon, unlike some other estimation-theory questions currently being faced.
See also Holden Karnofsky’s recent post Bayesian Mindset.
(If people stumble upon this in future, I’d also recommend reading Greg Lewis’ interesting Use resilience, instead of imprecision, to communicate uncertainty.)
Some related things that come to mind:
Challenges to Bayesian Confirmation Theory outlines some conceptual potential issues arising from the use of explicit probabilities in a Bayesian framework.
Gerd Gigerenzer likes to claim that “fast and frugal” heuristics often just perform better than more formal, quantitative models. These claims can be linked to the bias-variance tradeoff and extreme priors.
The optimizer’s curse can be generalized to the satisficer’s curse. This generalization doesn’t obviously seem to differentially affect explicit probabilities though.
Thanks for these links. I know a little about the satisficer’s curse, and share the view that “This generalization doesn’t obviously seem to differentially affect explicit probabilities though.” Hopefully I’ll have time to look into the other two things you mention at some point.
(My kneejerk reaction to “‘fast and frugal’ heuristics often just perform better than more formal, quantitative models” is that if it’s predictable that a heuristic would result in more accurate answers, even if we imagine we could have unlimited time for computations or whatever, then that fact, and ideally whatever causes it, can just be incorporated into the explicit model. But that’s just a kneejerk reaction. And in any case, if he’s just saying that in practice heuristics are often better, then I totally agree.)
I’m not very sure, but I imagine that the Optimizer’s curse might result in a reason against maximizing expected utility (though I’d distinguish it from using explicit probability models in general) if we’re dealing with a bounded budget—in which case, one might prefer a suboptimal option with low variance...?
(Plus, idk if this is helpful: in social contexts, a decision rule might incorporate the distribution of the cognitive burdens—I’m thinking about Prudence in Accounting, or maybe something like a limited precautionary principle. If you use an uninformative prior to assess a risk / liability / asset of a company, it might be tempted to hide information)
I now believe the statement of mine you quote was incorrect, and I’ve updated the optimizer’s curse section, primarily to remove the sentence you quoted (as I think it’s unnecessary in any case) and to alter an earlier part where I made a very similar claim so that it now says:
(I think I already knew this but just previously didn’t explain it properly, leaving the conditions I had in mind as assumed, even though they often won’t hold in practice.)
But I think this updated version doesn’t address the points you make. From “if we’re dealing with a bounded budget—in which case, one might prefer a suboptimal option with low variance”, it sounds to me like maybe what you’re getting at is risk-aversion and/or diminishing returns to a particular thing?
For example, let’s say I can choose either A, which gives me $1 thousand in expectation, or B, which gives me $1 million in expectation. So far, B obviously seems way better. But what if B is way higher uncertainty (or way higher risk, if one prefers that phrasing)? Then maybe I’d prefer A.
I’d personally consider this biased if it’s pure risk-aversion, and the dollar values perfectly correspond to my “utility” from this. But in reality, each additional dollar is less valuable. For example, perhaps I’m broke, and by far the most important thing is that I get $1000 to get myself out of a real hole—a quite low chance of much higher payoffs isn’t worth it, because I get far less than 1000 times as much value out of 1000 times as much money.
If that’s what you were getting at, I think that’s all valid, and I think the optimizer’s curse does probably magnify those reasons to sometimes not go with what you estimate will give you, in expectation, the most of some thing you value. But I think really that doesn’t depend on the optimizer’s curse, and is more about uncertainty in general. Also, I think it’s really important to distinguish “maximising expected utility” from “maximising expected amount of some particular thing I value”. My understanding is that “risk-aversion” based on diminishing returns to dollars, for example, can 100% make sense within expected utility maximisation—it’s only pure risk-aversion (in terms of utility itself) that can’t.
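Here is a minimal sketch of the distinction between maximising expected dollars and maximising expected utility, using the A-versus-B example above. The specific lottery for B, the starting wealth, and the choice of a logarithmic utility function are all illustrative assumptions; they are not claims about anyone’s actual utility function.

```python
import math

STARTING_WEALTH = 100.0  # hypothetical: the decision-maker is nearly broke

# Each option is a list of (probability, dollar payoff) pairs.
option_a = [(1.0, 1_000.0)]                          # a sure $1 thousand
option_b = [(0.001, 1_000_000_000.0), (0.999, 0.0)]  # about $1 million in expectation

def expected_dollars(option):
    """Expected amount of the particular thing valued (here, dollars)."""
    return sum(p * payoff for p, payoff in option)

def expected_log_utility(option, wealth=STARTING_WEALTH):
    """Expected utility, assuming utility is the log of final wealth (diminishing returns)."""
    return sum(p * math.log(wealth + payoff) for p, payoff in option)

for name, option in [("A", option_a), ("B", option_b)]:
    print(f"{name}: expected dollars = {expected_dollars(option):,.0f}, "
          f"expected log-utility = {expected_log_utility(option):.3f}")
```

With these made-up numbers, B wins on expected dollars while A wins on expected utility. Preferring A is therefore consistent with expected utility maximisation and needs no appeal to pure risk-aversion over utility itself.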
(Let me know if I was totally misunderstanding you.)
I am very satisfied with the new text. I think you understood me pretty well; the problem is, I was a little bit unclear and ambiguous.
I’m not sure if this impacts your argument: I think diminishing returns accounts pretty well for saturation (i.e., gaining $1 is not as important as losing $1); but it’s plausible to complement subjective expected utility theory with pure risk-aversion, like Lara Buchak does.
But what I actually had in mind is something like the St. Petersburg paradox, the extreme case for unbounded utility: if you’re willing to constantly bet all your budget, you’ll surely end up bankrupt with $0. In real life, I guess that if you were constantly updating your marginal utility per dollar, this wouldn’t be a problem (so I agree with you: this is not a challenge to expected utility maximisation).
Yeah, I’ve seen mentions of Buchak’s work and one talk from her, but didn’t really get it, and currently (with maybe medium confidence?) still think that, when talking about utility itself, and thus having accounted for diminishing returns and all that, one should be risk-neutral.
I hadn’t heard of martingales, and have relatively limited knowledge of the St Petersburg paradox. It seems to me (low confidence) that:
things like the St Petersburg paradox and Pascal’s mugging are plausible candidates for reasons to reject standard expected utility maximisation, at least in certain edge cases, and maybe also expected value reasoning
Recognising that there are diminishing returns to many (most?) things at least somewhat blunts the force of those weird cases
Things like accepting risk aversion or rounding infinitesimal probabilities to 0 may solve the problems without us having to get rid of expected value reasoning or entirely get rid of expected utility maximisation (just augment it substantially)
There are some arguments for just accepting as rational what expected utility maximisation says in these edge cases—it’s not totally clear that our aversion to the “naive probabilistic” answer here is valid; maybe that aversion just reflects scope neglect, or the fact that, in the St Petersburg case, there’s the overlooked cost of it potentially taking months of continual play to earn substantial sums
I don’t think these reveal problems with using EPs specifically. It seems like the same problems could occur if you talked in qualitative terms about probabilities (e.g., “at least possible”, “fairly good odds”), and in either case the “fix” might look the same (e.g., rounding down either a quantitative or qualitative probability to 0 or to impossibility).
But it does seem that, in practice, people not using EPs are more likely to round down low probabilities to 0. This could be seen as good, for avoiding Pascal’s mugging, and/or as bad, for a whole host of other reasons (e.g., ignoring many x-risks).
Maybe a fuller version of this post would include edge cases like that, but I know less about them, and I think they could create “issues” (arguably) even when one isn’t using explicit probabilities anyway.
I mostly agree with you. I subtracted the reference to martingales from my previous comment because: a) not my expertise, b) this discussion doesn’t need additional complexity.
I’m sorry for having raised issues about paradoxes (perhaps there should be a Godwin’s Law about them); I don’t think we should mix edge cases like St. Petersburg (and problems with unbounded utility in general) with the optimizer’s curse – it’s already hard to analyze them separately.
Pace Buchak, I agree with that, but I wouldn’t say it aloud without adding caveats: in the real world, our problems are often of dynamic choice (and so one may have to think about optimal stopping and strategies, information gathering, etc.), we don’t observe utility-functions, we have limited cognitive resources, and we are evaluated and have to cooperate with others, etc. So I guess some “pure” risk-aversion might be a workable satisficing heuristic to [signal you] try to avoid the worst outcomes when you can’t account for all that. But that’s not talking about utility itself, and certainly not talking about probability / uncertainty itself.
In line with the spirit of your comment, I believe, I think that it’s useful to recognise that not all discussions related to pros and cons of probabilities or how to use them or that sort of thing can or should address all potential issues. And I think that it’s good to recognise/acknowledge when a certain issue or edge case actually applies more broadly than just to the particular matter at hand (e.g., how St Petersburg is relevant even aside from the optimizer’s curse). An example of roughly the sort of reasoning I mean with that second sentence, from Tarsney writing on moral uncertainty:
But I certainly don’t think you need to apologise for raising those issues! They are relevant and very worthy of discussion—I just don’t know if they’re in the top 7 issues I’d discuss in this particular post, given its intended aims and my current knowledge base.
Oh, I only apologised because, well, if we start discussing catchy paradoxes, we’ll soon lose track of our original point.
But if you enjoy it, and since it is a relevant subject, I think people use 3 broad “strategies” to tackle St. Petersburg paradoxes and the like:
[epistemic status: low, but it kind of makes sense]
a) “economist”: “if you use a bounded version, or take time into account, the paradox disappears: just apply a logarithmic function for diminishing returns...”
b) “philosopher”: “unbounded utility is weird” or “beware, it’s Pascal’s Wager with objective probabilities!”
c) “statistician”: “the problem is this probability distribution, you can’t apply central limit / other theorem, or the indifference principle, or etc., and calculate its expectation”
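Strategy (a) can be illustrated with a short calculation. The sketch below assumes the standard version of the game (a payoff of 2^k dollars with probability 1/2^k) and, for simplicity, applies the logarithm to the payoff alone rather than to total wealth; both are assumptions made for illustration. It compares expected dollars with expected log-utility as the maximum number of rounds grows.

```python
import math

def truncated_expectations(max_rounds):
    """Expected dollars and expected log-utility if the game is capped at max_rounds flips."""
    expected_dollars = 0.0
    expected_log_utility = 0.0
    for k in range(1, max_rounds + 1):
        prob = 0.5 ** k      # probability that the first heads comes on flip k
        payoff = 2.0 ** k    # dollars paid in that case
        expected_dollars += prob * payoff                 # each term adds exactly 1
        expected_log_utility += prob * math.log(payoff)   # terms shrink quickly
    return expected_dollars, expected_log_utility

for n in (10, 20, 40):
    dollars, utility = truncated_expectations(n)
    print(f"cap at {n:2d} rounds: expected dollars = {dollars:.0f}, "
          f"expected log-utility = {utility:.4f}")
```

Expected dollars grow without bound as the cap is raised, while expected log-utility settles at 2·ln 2 (about 1.39), so an agent with this utility function would pay only a modest price to play. This does not address responses (b) or (c), since a game with payoffs that grow fast enough can revive the paradox for any unbounded utility function.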